Region-level Tracking for Scalable Directory Cache
Abstract
Traditional coherence directories track sharing information at cache-line granularity. In practice, however, data sharing occurs at a coarser granularity across large regions of memory, so common sharing patterns tend to be observed across multiple proximate lines. Hence, the directory entries for these lines replicate the same sharing information, resulting in inefficient use of space, power, and energy. In this paper, we empirically demonstrate “region-level sharing pattern locality”: only a small number of distinct sharing patterns are observed across proximate lines within a large region of memory, e.g., a page. We leverage this phenomenon to propose a new representation of sharing information, called Region-level Sharing information Tracking (RST), that dynamically maintains common sharing information in a space-efficient manner at the region level. Our experimental results on conventional parallel and server workloads show that RST reduces area (and hence energy) by over 75% compared to conventional directory caches, with almost negligible performance overhead.
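The storage argument in the abstract can be illustrated with a small sketch. It is not the RST design itself, which the abstract does not specify. It assumes 64-byte lines, 4 KB regions, a full sharer bit-vector per core, and a hypothetical region-level layout consisting of a table of distinct patterns plus a per-line index into that table. Tag and coherence-state bits are ignored, and all function names are invented for illustration.

```python
from collections import defaultdict
from math import ceil, log2

LINE_BYTES = 64       # assumed cache-line size
REGION_BYTES = 4096   # assumed region size (one page)
LINES_PER_REGION = REGION_BYTES // LINE_BYTES


def patterns_by_region(sharers):
    """Map each region number to the set of distinct sharer bitmasks seen in it.

    `sharers` maps a line's byte address to a bitmask of the cores sharing it.
    """
    regions = defaultdict(set)
    for line_addr, mask in sharers.items():
        regions[line_addr // REGION_BYTES].add(mask)
    return dict(regions)


def per_line_bits(num_cores):
    """Bits per region when every line stores its own full sharer bit-vector."""
    return LINES_PER_REGION * num_cores


def region_level_bits(num_cores, num_patterns):
    """Bits per region for the hypothetical pattern table plus per-line index."""
    if num_patterns == 0:
        return 0
    index_bits = max(1, ceil(log2(num_patterns)))
    return num_patterns * num_cores + LINES_PER_REGION * index_bits


# Toy directory contents: three lines in region 0, one line in region 1.
sharers = {0x0000: 0b0011, 0x0040: 0b0011, 0x0080: 0b1100, 0x1000: 0b0001}
distinct = patterns_by_region(sharers)
```

With 16 cores and two distinct patterns in a region, this toy layout needs 96 bits against 1024 bits for per-line bit-vectors. These numbers come from the assumptions above and are not results from the paper.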
Similar resources
C-AMTE: A location mechanism for flexible cache management in chip multiprocessors
This paper describes Constrained Associative-Mapping-of-Tracking-Entries (C-AMTE), a scalable mechanism to facilitate flexible and efficient distributed cache management in large-scale chip multiprocessors (CMPs). C-AMTE enables fast locating of cache blocks in CMP cache schemes that employ one-to-one or one-to-many associative mappings. C-AMTE stores in per-core data structures tracking entries...
Fusion Coherence: Scalable Cache Coherence for Heterogeneous Kilo-Core System
Future heterogeneous systems will integrate CPUs and GPUs on a single chip to achieve both high computing performance and high throughput. In general, such systems would abandon the current discrete design and build a unified shared memory system, avoiding explicit data movement among CPUs and GPUs connected by a high-throughput NoC. We propose a scalable cache coherence solution Fusion Coherence ...
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs
Locality has always been a critical factor in on-chip data placement on CMPs, as accessing further-away caches has in the past been more costly than accessing nearby ones. Substantial research on locality-aware designs has thus focused on keeping a copy of the data private. However, this complicates the problem of data tracking and search/invalidation; tracking the state of a line at all on-chi...
Efficient Implementation of Cache Coherence in Scalable Shared Memory Multiprocessors
The cache coherence scheme for a scalable distributed shared memory multiprocessor should be efficient in terms of memory overhead for maintaining the directories, as well as network latency for a memory request. In this paper, we propose a cache coherence scheme which minimizes the memory access delay and at the same time, reduces the directory overhead by using a limited directory scheme. In t...
Using In-flight Chains to Build a Scalable Cache Coherence Protocol
Samantika Subramaniam (Intel Corporation), Simon C. Steely (Intel Corporation), Will Hasenplaugh (Intel Corporation and MIT)
As microprocessor designs integrate more cores, scalability of cache coherence protocols becomes a challenging problem. Most directory-based protocols avoid races by using blocking tag-directories which can impact the performance of parallel applications. In this paper we first quantitatively demonstrate that state-of-the-art blocking protocols significantly constrain throughput at large core c...